Conversation
Replaces DSN-embedded PRAGMAs with explicit db.Exec("PRAGMA ...") calls
so the recipe is driver-portable and self-documenting. Adds
PRAGMA synchronous=NORMAL — the WAL-safe default that cuts two fsyncs
per commit to one — which was silently missing before. Raises
MaxOpenConns from 1 to 4 (with MaxIdleConns=2, ConnMaxLifetime=1h) so
concurrent readers overlap under WAL; the writer remains serialized
by SQLite itself.
Adds (*Store).Ping(ctx context.Context) error on the public Store
surface so Block 4's /readyz can distinguish a cancelled caller from
a dead database. Covered by TestStore_PingContext.
Block 3.6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
langchaingo allocates a fresh net/http.Transport inside every provider constructor by default — a new idle-conn pool, TLS session cache, and DNS bucket per openai.New / ollama.New call. For a long-running server that talks to the same upstream on every request, that's pure waste and defeats keep-alive. Adds internal/llm/httpclient.go with newHTTPClient() returning a *http.Client backed by a tuned *http.Transport (MaxIdleConns=100, MaxIdleConnsPerHost=10, IdleConnTimeout=90s, TLSHandshakeTimeout=10s, ResponseHeaderTimeout=60s). Injects the same client into chat and embed langchaingo sub-clients via WithHTTPClient options so every provider uses exactly one pool. The pooled client's Timeout is deliberately 0 — per-call deadlines land in Task 3 (3.3) via ctx. Hard-cutting the client would truncate streaming embedding responses. Stores the client and provider-specific batchCeiling on lcProvider so Task 3 can layer ctx timeouts and Task 4 can enforce per-provider embedding batch limits. Block 3.5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every Complete / Embed / EmbedBatch call now wraps its parent ctx in context.WithTimeout(parent, cfg.LLM.CallTimeout). Default is 60s; callers can disable by setting llm.call_timeout=0, in which case the parent ctx's deadline is authoritative. The vendored langchaingo has no internal retry loop, so the spec's "retry counts against total timeout" clause is trivially satisfied by wrapping once at the provider entry point. Adds config.LLMConfig.CallTimeout + DOCSIQ_LLM_CALL_TIMEOUT env binding + a 60s default. Tests use stub Model/Embedder implementations that block on ctx.Done, proving DeadlineExceeded surfaces cleanly within 500ms when callTimeout=50ms. Block 3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each provider now declares a BatchCeiling via the Provider interface: OpenAI 2048, Azure 16 (deployment default), Ollama 128 (heuristic). lcProvider.EmbedBatch slices input to the ceiling and pushes per-chunk results through a buffered channel of size 2 (equivalent to 2*ceiling vector slots) — a slow consumer backpressures the producer once the buffer fills. Chunks are dispatched serially in the producer goroutine; multi-chunk concurrency remains the Embedder's responsibility. The Embedder's New() now clamps caller-supplied batchSize to the provider's ceiling, so "batch_size: 5000" in a config against an Azure deployment silently becomes 16 rather than failing with a provider-side 400. Also updates the three test fakes (nopProvider, mockProvider, FakeProvider) for the new interface method. Block 3.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds scripts/ctx-audit.sh — a static check that flags any HTTP or
DB call in the five ctx-required packages (internal/{llm,embedder,
extractor,crawler,store}) that bypasses ctx. HTTP: http.Get,
http.Post, client.Get, client.Post, http.DefaultClient,
http.NewRequest (non-ctx). DB: .Query, .Exec, .QueryRow (non-Context
variants), with an intentional carve-out for the open()/migrate()
PRAGMA+schema paths that run before a Store is constructed.
The initial audit surfaced three crawler helpers (parseSitemap,
discoverSitemap, extractLinks) that used client.Get(url) instead of
client.Do(http.NewRequestWithContext(ctx, ...)). A ctx.Done while
those were in flight would stall for up to the HTTP client's Timeout
(30s) waiting on the response. Fixed by threading ctx through every
level of the crawl.
The script intentionally scopes to I/O-call checks — signature-based
"exported funcs must take ctx first" in pure bash is brittle against
Go's return-tuple syntax. The Go compiler already enforces ctx
propagation at every call site once the interfaces demand it; the
script's value is the I/O side-channel check vet does not cover.
CI wiring of this script is deferred to a follow-up PR so this block
does not touch .github/workflows/ (owned by a parallel block).
Block 3.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the router in http.TimeoutHandler(cfg.Server.RequestTimeout,
"request timeout") — default 30s — so a misbehaving handler cannot
hold a goroutine indefinitely. POST /api/upload and
POST /api/projects/{p}/import route through a longer-timeout branch
bounded by cfg.Server.UploadTimeout (default 10m) so large indexing
kickoffs aren't killed mid-stream.
Layering is deliberate:
securityHeadersMiddleware (outermost, Block 2)
requestTimeoutMiddleware (new, Block 3.2)
loggingMiddleware
recoveryMiddleware
bearerAuthMiddleware
projectMiddleware
mux
A 503 timeout response carries CSP/security headers; the timeout is
still logged.
Adds server.request_timeout + server.upload_timeout Viper keys with
env bindings (DOCSIQ_SERVER_REQUEST_TIMEOUT / _UPLOAD_TIMEOUT).
Block 3.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
recoveryMiddleware now logs req_id, route, method, user (if authed), and the full debug.Stack() on panic — everything needed to reconstruct a request from the log alone, no stderr tail required. Backed by panic_enrichment_test.go which asserts on the attribute surface via a TextHandler buffer capture. cmd/serve.go: raises srv.Shutdown deadline from 10s to 30s so it matches the workq drain phase; splits the single shutdown start/end pair into per-phase log lines so operators can see exactly where a stuck shutdown is hung. Adds ctxUserKey in request_id.go — reserved for future auth integration; today only recoveryMiddleware reads it (empty string means unauthenticated). TestShutdown_NoGoroutineLeaks: adds database/sql.(*DB).connectionCleaner to the ignore list — Block 3.6's ConnMaxLifetime=1h spawns this stdlib goroutine which only exits on DB.Close(). Same post-Close window as the already-tolerated connectionOpener. Block 1 already shipped signal handling + workq.Close(drainCtx); this task closes the two remaining 3.7 gaps. Block 3.7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-aligns the BatchCeiling() accessor rows on lcProvider, nopProvider, and mockProvider to match gofmt's spacing. Limited to lines added in this branch; pre-existing trailing-whitespace and map-key alignment issues on main are left untouched per brownfield discipline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Block 3 ships seven landable commits that make docsiq resilient to misbehaving upstreams, slow disks, abrupt shutdowns, and runaway requests. Every outbound call now has a timeout; every HTTP provider shares a pooled client; SQLite runs with explicit PRAGMAs and a raised connection pool; EmbedBatch chunks against each provider's declared ceiling with backpressure; the HTTP surface is ringed by
http.TimeoutHandler; and the graceful-shutdown + panic-log paths carry the context an operator needs.Commits
feat(store)— SQLite hardening: explicit PRAGMAs (journal_mode=WAL,busy_timeout=5000,synchronous=NORMAL,foreign_keys=ON); raised pool (MaxOpenConns=4,MaxIdleConns=2,ConnMaxLifetime=1h);(*Store).Ping(ctx)for/readyz. [3.6]feat(llm)— One pooled*http.Clientper provider vianewHTTPClient()(MaxIdle 100, per-host 10, IdleTimeout 90s, TLSHandshake 10s, ResponseHeader 60s). [3.5]feat(llm)— Per-call timeout wrapping (default 60s viallm.call_timeout). [3.3]feat(llm,embedder)—Provider.BatchCeiling()(OpenAI 2048, Azure 16, Ollama 128);lcProvider.EmbedBatchchunks + backpressures via buffered channel;embedder.NewclampsbatchSizeto ceiling. [3.4]feat(ci,crawler)—scripts/ctx-audit.shstatic check; crawlerparseSitemap/discoverSitemap/extractLinksnow thread ctx throughhttp.NewRequestWithContext. [3.1]feat(api)—requestTimeoutMiddleware(cfg)wraps the router with 30s default / 10m upload override (POST /api/upload,POST /api/projects/{p}/import); sits insidesecurityHeadersMiddlewareand outsideloggingMiddleware. [3.2]feat(api,serve)— Panic log enriched withreq_id, route, method, user, fulldebug.Stack();srv.Shutdowndeadline raised from 10s to 30s; per-phase log lines (HTTP server shutting down,workq draining). [3.7]Test plan
CGO_ENABLED=1 go test -tags sqlite_fts5 -timeout 300s ./...— all pass (566 tests, 22 packages)CGO_ENABLED=1 go test -tags sqlite_fts5 -race -timeout 300s ./...— all pass, 0 data races (588 tests post-Block 3)CGO_ENABLED=1 go test -tags "sqlite_fts5 integration" -race -timeout 1200s ./...— all pass (after addingdatabase/sql.(*DB).connectionCleanerto the goleak ignore list — spawned byConnMaxLifetime=1h, exits onDB.Close())go vet -tags sqlite_fts5 ./...— cleanbash scripts/ctx-audit.sh—CTX AUDIT OKMigration / ops notes
llm.call_timeout(DOCSIQ_LLM_CALL_TIMEOUT),server.request_timeout(DOCSIQ_SERVER_REQUEST_TIMEOUT),server.upload_timeout(DOCSIQ_SERVER_UPLOAD_TIMEOUT).Providerinterface gainedBatchCeiling() int— all three test fakes (nopProvider,mockProvider,FakeProvider) updated.MaxOpenConnsraised from 1 to 4: SQLite WAL serializes writers internally; this lets readers overlap. Race detector clean.Follow-ups (out of scope)
scripts/ctx-audit.sh— deferred, does not touch.github/workflows/(owned by parallel block).internal/loader.LoadURL(url)does not yet accept ctx — outside the Block 3.1 audited packages; filed as follow-up.ctxUserKeyis defined but no middleware populates it today; wired in when session auth lands.Generated with Claude Code